AritPIM: High-Throughput In-Memory Arithmetic
Abstract
Digital processing-in-memory (PIM) architectures are rapidly emerging to overcome the memory-wall bottleneck by integrating logic within memory elements. Such architectures provide vast computational power within the memory itself in the form of parallel bitwise logic operations. We develop novel algorithmic techniques for PIM that, combined with new perspectives on computer arithmetic, extend this bitwise parallelism to the four fundamental arithmetic operations (addition, subtraction, multiplication, and division), for both fixed-point and floating-point numbers, and using both bit-serial and bit-parallel approaches. We propose a state-of-the-art suite of arithmetic algorithms, demonstrating the first algorithm in the literature of digital PIM for a majority of cases, including cases previously considered impossible for digital PIM, such as floating-point addition. Through a case study on memristive PIM, we compare the proposed algorithms to an NVIDIA RTX 3070 GPU and demonstrate significant throughput and energy improvements.
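To illustrate what "bit-serial" arithmetic built from parallel bitwise operations can look like, here is a minimal software sketch. It is not the AritPIM algorithm itself: it assumes a simple ripple-carry adder using XOR/AND/OR, whereas a memristive PIM would typically build these from its own native gates. The names `to_columns`, `from_columns`, `bit_serial_add`, and `add_rows` are hypothetical. Each "column" is a Python integer used as a bit-vector, with one bit per memory row, so a single bitwise operation acts on all rows at once.

```python
def to_columns(values, n_bits):
    """Transpose row-wise integers into bit columns.

    Column i is a Python int whose bit r holds bit i of values[r].
    Assumes every value is non-negative and fits in n_bits.
    """
    cols = [0] * n_bits
    for r, v in enumerate(values):
        for i in range(n_bits):
            cols[i] |= ((v >> i) & 1) << r
    return cols


def from_columns(cols, n_rows):
    """Inverse of to_columns: rebuild one integer per row."""
    return [
        sum(((c >> r) & 1) << i for i, c in enumerate(cols))
        for r in range(n_rows)
    ]


def bit_serial_add(a_cols, b_cols):
    """Ripple-carry addition, one bit position per step.

    Each step uses only bitwise operations on whole columns,
    so all rows are added in parallel.
    """
    carry = 0
    out = []
    for a, b in zip(a_cols, b_cols):
        p = a ^ b                      # per-row propagate bit
        out.append(p ^ carry)          # per-row sum bit
        carry = (a & b) | (carry & p)  # per-row carry-out
    out.append(carry)                  # final carry becomes the top bit
    return out


def add_rows(xs, ys, n_bits=8):
    """Add xs[r] + ys[r] for every row r using the column-wise adder."""
    cols = bit_serial_add(to_columns(xs, n_bits), to_columns(ys, n_bits))
    return from_columns(cols, len(xs))
```

In this sketch the number of steps depends on the bit width (`n_bits`) rather than on the number of rows, which is the source of the throughput that bit-serial PIM approaches aim for.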
Similar resources
Memory Efficient Arithmetic
In this paper we give an algorithm for finding the mth base-b digit of a positive integer n (m = 1 is the least significant digit) defined as the final number in a sequence of integers gotten by multiplying, adding, and subtracting previous numbers in the sequence (actually, the algorithm finds arbitrarily precise approximations to n/b^m (mod 1), which can be used to get this mth digit whenever ...
High-Throughput, Low-Memory Applications on the Pica Architecture
This paper introduces Pica, a fine-grain, message passing architecture designed to efficiently support high-throughput parallel applications. This focus on high-throughput applications allows a small local memory of 4096 36-bit words. The architecture minimizes overhead for basic parallel operations. An operand-addressed context cache and round-robin task manager allow single cycle task swaps. ...
High-Throughput and Memory Efficient LDPC Decoder Architecture
Low-Density Parity-Check (LDPC) code is one kind of prominent error correcting codes (ECC) being considered in next generation industry standards. The decoder implementation complexity has been the bottleneck of its application. This paper presents a new kind of high-throughput and memory efficient LDPC decoder architecture. In general, more than fifty percent of memory can be saved over conven...
Journal
Journal title: IEEE Transactions on Emerging Topics in Computing
Year: 2023
ISSN: 2168-6750, 2376-4562
DOI: https://doi.org/10.1109/tetc.2023.3268137